Windsor
Trustformer: A Trusted Federated Transformer
Tadi, Ali Abbasi, Alhadidi, Dima, Rueda, Luis
Transformers, a cornerstone of deep-learning architectures for sequential data, have achieved state-of-the-art results in tasks like Natural Language Processing (NLP). Models such as BERT and GPT-3 exemplify their success and have driven the rise of large language models (LLMs). However, a critical challenge persists: safeguarding the privacy of data used in LLM training. Privacy-preserving techniques like Federated Learning (FL) offer potential solutions, but practical limitations hinder their effectiveness for Transformer training. Two primary issues are (I) the risk of sensitive information leakage due to aggregation methods like FedAvg or FedSGD, and (II) the high communication overhead caused by the large size of Transformer models. This paper introduces a novel FL method that reduces communication overhead while maintaining competitive utility. Our approach avoids sharing full model weights by simulating a global model locally. We apply k-means clustering to each Transformer layer, compute centroids locally, and transmit only these centroids to the server instead of full weights or gradients. To enhance security, we leverage Intel SGX for secure transmission of centroids. Evaluated on a translation task, our method achieves utility comparable to state-of-the-art baselines while significantly reducing communication costs. This provides a more efficient and privacy-preserving FL solution for Transformer models.
Robustness of Structured Data Extraction from In-plane Rotated Documents using Multi-Modal Large Language Models (LLM)
Biswas, Anjanava, Talukdar, Wrick
Multi-modal large language models (LLMs) have shown remarkable performance in various natural language processing tasks, including data extraction from documents. However, the accuracy of these models can be significantly affected by document in-plane rotation, also known as skew, a common issue in real-world scenarios for scanned documents. This study investigates the impact of document skew on the data extraction accuracy of three state-of-the-art multi-modal LLMs: Anthropic Claude V3 Sonnet, GPT-4-Turbo, and Llava:v1.6. We focus on extracting specific entities from synthetically generated sample documents with varying degrees of skewness. The results demonstrate that document skew adversely affects the data extraction accuracy of all the tested LLMs, with the severity of the impact varying across models. We identify the safe in-plane rotation angles (SIPRA) for each model and investigate the effects of skew on model hallucinations. Furthermore, we explore existing skew detection and correction mechanisms and discuss their potential limitations. We propose alternative approaches, including developing new multi-modal architectures that are inherently more robust to document skew and incorporating skewing techniques during the pre-training phase of the models. Additionally, we highlight the need for more comprehensive testing on a wider range of document quality and conditions to fully understand the challenges and opportunities associated with using multi-modal LLMs for information extraction in real-world scenarios.
Phasor-Driven Acceleration for FFT-based CNNs
Reis, Eduardo, Akilan, Thangarajah, Khalid, Mohammed
Recent research in deep learning (DL) has investigated the use of the Fast Fourier Transform (FFT) to accelerate the computations involved in Convolutional Neural Networks (CNNs) by replacing spatial convolution with element-wise multiplications on the spectral domain. These approaches mainly rely on the FFT to reduce the number of operations, which can be further decreased by adopting the Real-Valued FFT. In this paper, we propose using the phasor form, a polar representation of complex numbers, as a more efficient alternative to the traditional approach. The experimental results, evaluated on the CIFAR-10, demonstrate that our method achieves superior speed improvements of up to a factor of 1.376 (average of 1.316) during training and up to 1.390 (average of 1.321) during inference when compared to the traditional rectangular form employed in modern CNN architectures. Similarly, when evaluated on the CIFAR-100, our method achieves superior speed improvements of up to a factor of 1.375 (average of 1.299) during training and up to 1.387 (average of 1.300) during inference. Most importantly, given the modular aspect of our approach, the proposed method can be applied to any existing convolution-based DL model without design changes.
Comparative Analysis of LLaMA and ChatGPT Embeddings for Molecule Embedding
Sadeghi, Shaghayegh, Bui, Alan, Forooghi, Ali, Lu, Jianguo, Ngom, Alioune
Purpose: Large Language Models (LLMs) like ChatGPT and LLaMA are increasingly recognized for their potential in the field of cheminformatics, particularly in interpreting Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs can decode SMILES strings into vector representations, providing a novel approach to understanding chemical graphs. Methods: We investigate the performance of ChatGPT and LLaMA in embedding SMILES strings. Our evaluation focuses on two key applications: molecular property (MP) prediction and drug-drug interaction (DDI) prediction, both essential in drug development and healthcare. Results: We find that SMILES embeddings generated using LLaMA outperform those from ChatGPT in both MP and DDI prediction tasks. Notably, LLaMA-based SMILES embeddings show results comparable to existing methods in both prediction tasks. Conclusion: The application of LLMs in cheminformatics, particularly in utilizing SMILES embeddings, shows significant promise for advancing drug development. This includes improving the prediction of chemical properties and facilitating the drug discovery process. GitHub: https://github.com/sshaghayeghs/LLaMA-VS-ChatGPT
A Comprehensive Study of Vision Transformers in Image Classification Tasks
Khalil, Mahmoud, Khalil, Ahmad, Ngom, Alioune
Image Classification is a fundamental task in the field of computer vision that frequently serves as a benchmark for gauging advancements in Computer Vision. Over the past few years, significant progress has been made in image classification due to the emergence of deep learning. However, challenges still exist, such as modeling fine-grained visual information, high computation costs, the parallelism of the model, and inconsistent evaluation protocols across datasets. In this paper, we conduct a comprehensive survey of existing papers on Vision Transformers for image classification. We first introduce the popular image classification datasets that influenced the design of models. Then, we present Vision Transformers models in chronological order, starting with early attempts at adapting attention mechanism to vision tasks followed by the adoption of vision transformers, as they have demonstrated success in capturing intricate patterns and long-range dependencies within images. Finally, we discuss open problems and shed light on opportunities for image classification to facilitate new research ideas.
Getting Government AI Engineers to Tune into AI Ethics Seen as Challenge - AI Trends
Engineers tend to see things in unambiguous terms, which some may call Black and White terms, such as a choice between right or wrong and good and bad. The consideration of ethics in AI is highly nuanced, with vast gray areas, making it challenging for AI software engineers to apply it in their work. That was a takeaway from a session on the Future of Standards and Ethical AI at the AI World Government conference held in-person and virtually in Alexandria, Va. this week. An overall impression from the conference is that the discussion of AI and ethics is happening in virtually every quarter of AI in the vast enterprise of the federal government, and the consistency of points being made across all these different and independent efforts stood out. "We engineers often think of ethics as a fuzzy thing that no one has really explained," stated Beth-Anne Schuelke-Leech, an associate professor, Engineering Management and Entrepreneurship at the University of Windsor, Ontario, Canada, speaking at the Future of Ethical AI session.
A Multi-Task Learning Framework for COVID-19 Monitoring and Prediction of PPE Demand in Community Health Centres
Molokwu, Bonaventure Chidube, Shuvo, Shaon Bhatta, Kobti, Ziad, Snowdon, Anne
Currently, the world seeks to find appropriate mitigation techniques to control and prevent the spread of the new SARS-CoV-2. In our paper herein, we present a peculiar Multi-Task Learning framework that jointly predicts the effect of SARS-CoV-2 as well as Personal-Protective-Equipment consumption in Community Health Centres for a given populace. Predicting the effect of the virus (SARS-CoV-2), via studies and analyses, enables us to understand the nature of SARS-CoV- 2 with reference to factors that promote its growth and spread. Therefore, these foster widespread awareness; and the populace can become more proactive and cautious so as to mitigate the spread of Corona Virus Disease 2019 (COVID- 19). Furthermore, understanding and predicting the demand for Personal Protective Equipment promotes the efficiency and safety of healthcare workers in Community Health Centres. Owing to the novel nature and strains of SARS-CoV-2, relatively few literature and research exist in this regard. These existing literature have attempted to solve the problem statement(s) using either Agent-based Models, Machine Learning Models, or Mathematical Models. In view of this, our work herein adds to existing literature via modeling our problem statements as Multi- Task Learning problems. Results from our research indicate that government actions and human factors are the most significant determinants that influence the spread of SARS-CoV-2.